STAT321-18B (HAM)

Advanced Data Analysis

20 Points

Faculty of Computing and Mathematical Sciences
Rorohiko me ngā Pūtaiao Pāngarau
Department of Mathematics and Statistics

Staff


Administrator(s): rachael.foote@waikato.ac.nz

Librarian(s): debby.dada@waikato.ac.nz

You can contact staff by:

  • Calling +64 7 838 4466, selecting option 1, then entering the extension.
  • Extensions starting with 4, 5 or 9 can also be dialled directly:
    • For extensions starting with 4: dial +64 7 838 followed by the extension.
    • For extensions starting with 5: dial +64 7 858 followed by the extension.
    • For extensions starting with 9: dial +64 7 837 followed by the extension.

Paper Description


In the first part of the paper, we will develop the theory and practice of regression modelling: building statistical models that predict a variable y from one or more other variables. In STAT221, you will have met this in the form of linear regression, where y is assumed to be normally distributed about a mean given by a linear combination of the other variables. We will review linear regression using the R software and then show how models can be developed in which the distribution of y about its modelled mean is non-normal. These are known as Generalised Linear Models (GLMs). You will learn to use these models to analyse appropriate data sets.
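As a rough illustration of what a model with a non-normal response looks like in practice, the sketch below fits a Poisson regression with a log link, log(mean of y) = b0 + b1*x, using iteratively reweighted least squares. It is written in Python rather than the R used in the paper (in R, the same model would be fitted with glm(y ~ x, family = poisson)). The function name fit_poisson_glm and the count data are hypothetical, chosen only for this example.

```python
import math


def fit_poisson_glm(x, y, iterations=25):
    """Fit log(mu) = b0 + b1*x to counts y by iteratively reweighted least squares.

    Illustrative sketch only: one predictor, a fixed number of iterations,
    and no convergence or error checking.
    """
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        # Fitted means under the log link.
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Working response z = eta + (y - mu) / mu; the working weights are mu.
        z = [b0 + b1 * xi + (yi - mi) / mi for xi, yi, mi in zip(x, y, mu)]
        # Weighted least squares of z on x, solved via the 2x2 normal equations.
        sw = sum(mu)
        sx = sum(m * xi for m, xi in zip(mu, x))
        sxx = sum(m * xi * xi for m, xi in zip(mu, x))
        sz = sum(m * zi for m, zi in zip(mu, z))
        sxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (sw * sxz - sx * sz) / det
    return b0, b1


# Hypothetical count data, made up purely for illustration.
x = [0, 1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6, 8]
b0, b1 = fit_poisson_glm(x, y)
print(f"log(mean of y) = {b0:.3f} + {b1:.3f} * x")
```

The contrast with linear regression is in the two lines that compute mu and z: the mean is tied to the linear predictor through a log link, and each observation is weighted by its fitted mean because a Poisson variable's variance equals its mean.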

In the second part of the paper, we will introduce Weka and study several topics in multivariate statistics and machine learning: classification and discrimination; regression using Weka; tree models; principal component analysis; multivariate distances; unsupervised learning (hierarchical and non-hierarchical); the k-means clustering algorithm; and the EM algorithm.


Paper Structure


The paper is broadly divided into two main topics:

  • Generalised Linear Models, taught in Weeks 1 to 6; and
  • Multivariate techniques [jointly with the students of COMP321], taught in Weeks 7 to 12.

Instruction is given in the use of R and Weka, both of which are free, open-source software packages.

During the first six weeks, students of STAT321 will attend three lectures and one tutorial per week. During the second six weeks, STAT321 students will join COMP321 students in the COMP321 stream, attending two lectures and one lab per week. Note that the times and locations of lectures will change from the first half of the course to the second.


Learning Outcomes


Students who successfully complete the course should be able to:

  • identify an appropriate technique for analysing data, within the scope of the techniques covered;
  • communicate the results of an analysis; and
  • analyse data using R and Weka, within the scope of the techniques covered.

Assessment


Each half of the paper contributes half of the internal assessment.

The internal assessment will consist of:

Weeks 1-6 assessment:

2 assignments (7.5% each) and one test (10%).

Weeks 7-12 assessment:

4 tutorial exercises (3.75% each) and one assignment (10%).

Note: The internal assessment component contributes 50% to your final mark, and the final exam contributes the other 50%.

A final mark of 50% or higher is required to pass the course. However, if you achieve 50% or more overall but less than 20% of your final mark comes from the exam, you will receive a restricted pass.
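The weighting arithmetic above can be sketched as follows. This is an illustration in Python, not an official grade calculator: the component names, the function names, and the "fail" label are hypothetical. It also assumes one particular reading of the restricted-pass rule, namely that the exam contributes fewer than 20 of the 100 available percentage points.

```python
# Weights (percentage of the overall mark) as listed in this outline.
WEIGHTS = {
    "assignment_1": 7.5,
    "assignment_2": 7.5,
    "test": 10.0,
    "tutorial_1": 3.75,
    "tutorial_2": 3.75,
    "tutorial_3": 3.75,
    "tutorial_4": 3.75,
    "part_2_assignment": 10.0,
    "exam": 50.0,
}


def final_mark(scores):
    """Return the overall mark, given each component's score as a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] / 100.0 for name in WEIGHTS)


def result(scores):
    """Classify the outcome under the pass rules described above."""
    total = final_mark(scores)
    exam_points = WEIGHTS["exam"] * scores["exam"] / 100.0
    if total < 50.0:
        return "fail"
    # ASSUMPTION: "less than 20% of your final mark is from the exam" is read
    # as the exam contributing fewer than 20 percentage points overall.
    if exam_points < 20.0:
        return "restricted pass"
    return "pass"
```

For example, a student who scores 100% on every internal component and 30% on the exam has an overall mark of 50 + 15 = 65, but only 15 points come from the exam, so under this reading the outcome would be a restricted pass.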


Assessment Components


The internal assessment/exam ratio (as stated in the University Calendar) is 50:50. The final exam makes up 50% of the overall mark.

| Component Description | Due Date | Time | Percentage of overall mark | Submission Method | Compulsory |
|---|---|---|---|---|---|
| 1. Part 1: Assignment 1 | 26 Jul 2018 | 11:30 PM | 7.5 | Online: Submit through Moodle | |
| 2. Part 1: Assignment 2 | 16 Aug 2018 | 11:30 PM | 7.5 | Online: Submit through Moodle | |
| 3. Test | 16 Aug 2018 | 10:00 AM | 10 | Hand-in: In Lecture | |
| 4. Tutorial Exercises 1 | | | 3.75 | | |
| 5. Tutorial Exercises 2 | | | 3.75 | | |
| 6. Tutorial Exercises 3 | | | 3.75 | | |
| 7. Tutorial Exercises 4 | | | 3.75 | | |
| 8. Part 2: Assignment | | | 10 | | |
| 9. Exam | | | 50 | | |
| Assessment Total | | | 100 | | |

Failing to complete a compulsory assessment component of a paper will result in an IC grade.

Required and Recommended Readings


Recommended Readings

  • Practical Regression and ANOVA using R by Julian Faraway. Chapman & Hall/CRC, 2002
  • Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models by Julian Faraway. Chapman & Hall/CRC, 2006 (a 2016 online edition is also available from the library)
  • Data Mining: Practical Machine Learning Tools and Techniques (4th ed.) by I.H. Witten, E. Frank, M.A. Hall, and C.J. Pal. Morgan Kaufmann, 2016

Other Resources


The material in the lectures may be extended by readings from various sources. These readings are part of the paper and may be assessed.

Both R and Weka are available on the university computers in Lab 5 (R.G.12). If you would like your own copies of these packages, you are welcome to download them free of charge.

The R statistical package can be downloaded from http://cran.stat.auckland.ac.nz/

Weka can be downloaded from http://www.cs.waikato.ac.nz/~ml/weka/


Online Support


All material provided for this paper can be accessed from Moodle. Moodle will also be the source of news concerning the paper. It is your responsibility to ensure that you have access to the STAT321 page on Moodle and that your contact details on Moodle are up to date. To access Moodle, go to http://elearn.waikato.ac.nz.


Workload


Students should expect to spend at least 10 hours per week on this paper, including the 4 scheduled contact hours of lectures, tutorials, and labs.
